Scalable Fault Tolerance
Abstract
As communication networks grow, existing fault-handling tools become increasingly unaffordable. In many cases the reason is that they rely on global measures, such as global time-outs or reset procedures, whose cost grows with the size of the network. For a fault-handling mechanism to scale to large networks, it should instead involve local measures or, at worst, fault-local measures, i.e., measures whose cost depends only on the number of failed nodes (which, thanks to today's technology, grows much more slowly than the networks themselves). This decreases the recovery time and, moreover, often allows the non-faulty regions of the network to continue operating even while the faulty parts recover. We describe several research ideas that lead in this direction.
Similar resources
A Scalable Byzantine Fault Tolerant Service in Grid System
This paper describes the design, implementation and usage of a secure, scalable Byzantine fault tolerant MDS system in the Grid. The scalable Byzantine fault tolerant MDS system provides a hierarchy of GIIS servers; a local GIIS domain can request the resources it needs from a remote GIIS domain. By using the state-machine replication approach and quorum system technique, the scalable Byzantine fault t...
JSEB (Java Scalable sErvices Builder): Scalable Systems for Clusters of Workstations
We present a report on JSEB (Java Scalable Service Builder) whose goal is to offer programmers a tool that can be used to efficiently add scalability and fault-tolerance to a replicated service in cluster(s) of workstations.
Achieving On-chip Fault-tolerance Utilizing BIST Resources
Widespread reliability challenges are expected for 65nm and below VLSI fabrication technologies. Effective and efficient on-chip fault-tolerance solutions are required to counter these challenges. A new post-fabrication, reconfigurable and scalable approach to achieving on-chip fault-tolerance, using built-in self-test (BIST) resources, has been proposed. This paper describes the approach a...
Run-Through Stabilization: An MPI Proposal for Process Fault Tolerance
The MPI standard lacks semantics and interfaces for sustained application execution in the presence of process failures. Exascale HPC systems may require scalable, fault resilient MPI applications. The mission of the MPI Forum’s Fault Tolerance Working Group is to enhance the standard to enable the development of scalable, fault tolerant HPC applications. This paper presents an overview of the ...
Scalable and Secure Data Collection: Fault Tolerance Considerations
Data collection, or uploading, is an inherent part of numerous digital government applications. In this poster we present our recent research directions in the development of Bistro, a scalable and secure architecture designed for collection of data over the Internet for digital government applications.
Large-Scale Computation Not at the Cost of Expressiveness
We present Celias, a new concurrent programming model for data-intensive scalable computing. Celias supports many virtues commonly found in existing distributed programming frameworks, such as elastic scaling and fault tolerance, without sacrificing expressiveness. The key design idea of Celias is the concept of a microtask, as a scalable, fault-tolerant, and completely data-driven unit of comp...